Abstract

We consider the problem of lossless compression of binary trees, with the aim of reducing 
the number of code bits needed to store or transmit such trees. A lossless grammar-based code is 
presented which encodes each binary tree into a binary codeword in two steps. In the first step, 
the tree is transformed into a context-free grammar from which the tree can be reconstructed. In 
the second step, the context-free grammar is encoded into a binary codeword. The decoder of the 
grammar-based code decodes the original tree from its codeword by reversing the two encoding 
steps. It is shown that the resulting grammar-based binary tree compression code is a universal code 
on a family of probabilistic binary tree source models satisfying certain weak restrictions. 

Index Terms 

grammar-based code, binary tree, lossless compression, context-free grammar, minimal DAG 
representation. 

I. Introduction 

There have been some recent initial attempts to conceptualize the notion of structure in 
information theory [12], [3], [16], with the ultimate future goal being the development of a
lossless compression theory for structures. In the present paper, we put forth a general 
framework for this area, and then develop a lossless compression theory for binary tree 
structures within this framework. Our framework will permit an abstract asymptotic theory 
for the compression of structures to be developed, where the framework is sufficiently general 
to include the types of structures that have been considered in other contexts, such as in the 
asymptotic theory of networks [13] or the asymptotic theory of patterns [7]. The basic concepts
in this framework are the notions of structure universe, structure filter, and structure source, 
which we now define; after the definitions, we give examples of the concepts relevant for the 
work we shall do in this paper. 

Concept of Structure Universe. Broadly speaking, "structure universe" will mean the set 
of structures under consideration in a particular context. Each structure has a "size" assigned 
to it, which is a positive integer that can be a measure of how large or how complex the 
structure is. For example, if a structure is a finite graph g, then the size of the structure could 
be taken as the number of vertices of g or the number of edges of g; if a structure is a finite 
tree t, then the size of the structure could be taken as the number of leaves of t. We now 
make the notion of structure universe precise. A structure universe $\Omega$ is defined to be any
countably infinite set such that for each $\omega \in \Omega$ there is defined a positive integer $|\omega|$, which
we call the size of $\omega$, such that the set $\{\omega \in \Omega : |\omega| = n\}$ is finite for each positive integer $n$.

Concept of Structure Filter. A structure filter $\mathcal{F}$ over a structure universe $\Omega$ (called an $\Omega$-filter
for short) is defined to be any set of finite nonempty subsets of $\Omega$ which forms a partition
of $\Omega$. For example, given any structure universe $\Omega$, we have the natural $\Omega$-filter consisting
of all nonempty subsets of $\Omega$ of the form $\{\omega \in \Omega : |\omega| = n\}$ ($n = 1, 2, \ldots$). Given an $\Omega$-filter
$\mathcal{F}$, a real-valued function $(x_F : F \in \mathcal{F})$ defined on $\mathcal{F}$, and an extended real number $L$, the
limit statement $\lim_{F \in \mathcal{F}} x_F = L$ means that for any neighborhood $\mathcal{N}$ of $L$ in the topology of
the extended real line, the set $\{F \in \mathcal{F} : x_F \notin \mathcal{N}\}$ is finite; the limit $L$, if it exists, is unique,
which is due to the fact that a structure filter is always countably infinite. Similarly, one can
make sense of limit statements of the form $\limsup_{F \in \mathcal{F}} x_F = L$ and $\liminf_{F \in \mathcal{F}} x_F = L$. The
sets in any $\Omega$-filter $\mathcal{F}$ are growing in the sense that

$$\lim_{F \in \mathcal{F}} \left[ \min\{|\omega| : \omega \in F\} \right] = \infty. \tag{1.1}$$

This condition will make possible an asymptotic theory of lossless compression of structures; 
we will see how the condition is used in Sec. III. 

Concept of Structure Source. Informally, suppose we randomly select a structure from each 
element of a structure filter; then these random structures constitute the output of a structure 
source. Formally, we define a structure source to be any triple $(\Omega, \mathcal{F}, P)$ in which $\Omega$ is a
structure universe, $\mathcal{F}$ is an $\Omega$-filter, and $P$ is a function from $\Omega$ into $[0, 1]$ such that

$$\sum_{\omega \in F} P(\omega) = 1, \quad F \in \mathcal{F}. \tag{1.2}$$

Note that (1.2) simply tells us that $P$ restricted to each $F \in \mathcal{F}$ yields a probability distribution
on $F$; for any subset $F'$ of $F$, we write the probability of $F'$ under this distribution as $P(F')$,
which is computed as the sum $\sum_{\omega \in F'} P(\omega)$.

Example 1. For each $n \ge 2$, fix an undirected graph $g_n$ with $n$ vertices and $n(n-1)/2$
edges, one edge for each pair of distinct vertices, and let $G_n$ be the set of edge-labelings of
$g_n$ in which each edge of $g_n$ is assigned a label from the set $\{0, 1\}$. That is, $G_n$ consists of
all pairs $(g_n, \alpha)$ in which $\alpha$ is a mapping from the set of edges of $g_n$ into the set $\{0, 1\}$. Let
$G_n^*$ be a subset of $G_n$ such that for each $(g_n, \alpha) \in G_n$, there exists a unique $(g_n, \alpha^*) \in G_n^*$ into
which $(g_n, \alpha)$ is carried by an isomorphism (that is, there is an isomorphism of $g_n$ onto itself
which carries each edge $e$ of $g_n$ into an edge $e'$ of $g_n$ for which the edge labels $\alpha(e), \alpha^*(e')$
coincide). For example, $G_3^*$ consists of four edge labelings of $g_3$, one in which all three of
the edges of $g_3$ are labeled 0, a second one in which all edge labels are 1, a third one in
which two edge labels are 0 and the remaining one is 1, and a fourth one in which two edge
labels are 1 and the remaining one is 0. Let $\Omega$ be the structure universe $\bigcup_{n \ge 2} G_n^*$, where we
define the size of each labeled graph in $\Omega$ to be the number of vertices of the graph. Let $\mathcal{F}$
be the $\Omega$-filter $\{G_n^* : n \ge 2\}$. For each $\sigma \in (0, 1)$, let $S_\sigma = (\Omega, \mathcal{F}, P_\sigma)$ be the structure source
such that for each $(g_n, \alpha') \in \Omega$,

$$P_\sigma(g_n, \alpha') = N(g_n, \alpha')\, \sigma^{m_0} (1 - \sigma)^{m_1},$$

where $m_0$ is the number of edges of $g_n$ assigned $\alpha'$-label 0, $m_1$ is the number of edges
of $g_n$ assigned $\alpha'$-label 1, and $N(g_n, \alpha')$ is the number of $(g_n, \alpha)$ belonging to $G_n$ for which
$(g_n, \alpha^*) = (g_n, \alpha')$. For example, the $P_\sigma$ probabilities assigned to the four structures in $G_3^*$ given
above are $\sigma^3$, $(1 - \sigma)^3$, $3\sigma^2(1 - \sigma)$, and $3\sigma(1 - \sigma)^2$, respectively. In random graph theory, the
structure source $S_\sigma$ is called the Gilbert model [6]. Choi and Szpankowski [3] addressed the
universal coding problem for the parametric family of sources $\{S_\sigma : 0 < \sigma < 1\}$. (We discuss
universal coding for general structure sources after the next two examples.)

Example 2. We consider finite rooted binary trees having at least two leaves such that each 
non-leaf vertex has exactly two ordered children. From now on, the terminology "binary tree" 
without further qualification will automatically mean such a tree. Let $\mathcal{T}$ be a set of binary
trees such that each binary tree is isomorphic as an ordered tree to a unique tree in $\mathcal{T}$. Then
$\mathcal{T}$ is a structure universe, where the size $|t|$ of a tree $t$ in the universe $\mathcal{T}$ is taken to be
the number of leaves of $t$. We discuss two ways in which $\mathcal{T}$ can be partitioned to obtain a
$\mathcal{T}$-filter. For each $n \ge 2$, let $\mathcal{T}_n$ be the set of trees in $\mathcal{T}$ that have $n$ leaves. For each $n \ge 1$,
let $\mathcal{T}^n$ be the set of trees in $\mathcal{T}$ for which the longest root-to-leaf path consists of $n$ edges
(that is, $\mathcal{T}^n$ consists of trees of depth $n$). Then $\mathcal{F}_1 = \{\mathcal{T}_n : n \ge 2\}$ and $\mathcal{F}_2 = \{\mathcal{T}^n : n \ge 1\}$
are each $\mathcal{T}$-filters. A structure source of the form $(\mathcal{T}, \mathcal{F}, P)$ for some $\mathcal{T}$-filter $\mathcal{F}$ is called a
binary tree source. In [12], binary tree sources of form $(\mathcal{T}, \mathcal{F}_1, P)$ were introduced which are
called leaf-centric binary tree source models; we address the universal coding problem for
such sources in Section IV of the present paper. In Section V, we address the universal coding
problem for a type of binary tree source of form $(\mathcal{T}, \mathcal{F}_2, P)$ which we call a depth-centric
binary tree source model.

Example 3. Let $A$ be a finite alphabet. For each $n \ge 1$, let $A^n$ be the set of all $n$-tuples of
entries from $A$. Then $\Omega = \bigcup_{n \ge 1} A^n$ is a structure universe, where we define the size of each
structure in $A^n$ to be $n$. Let $\mathcal{F}$ be the $\Omega$-filter $\{A^n : n \ge 1\}$. A structure source of the form
$(\Omega, \mathcal{F}, P)$ corresponds to the classical notion of finite-alphabet information source ([1], page
14). Thus, source coding theory for structure sources will include classical finite-alphabet
source coding theory as a special case.

Asymptotically Optimal Codes for Structure Sources. In the following and in the rest of
the paper, $\mathcal{B}$ denotes the set of non-empty finite-length binary strings, and $L[b]$ denotes the
length of string $b \in \mathcal{B}$. Let $\Omega$ be a structure universe. A lossless code on $\Omega$ is a pair $(\psi_e, \psi_d)$
in which

• $\psi_e$ (called the encoding map) is a one-to-one mapping of $\Omega$ into $\mathcal{B}$ which obeys the
prefix condition, that is, if $\omega_1$ and $\omega_2$ are two distinct structures in $\Omega$, then $\psi_e(\omega_1)$ is
not a prefix of $\psi_e(\omega_2)$; and

• $\psi_d$ (called the decoding map) is the mapping from $\psi_e(\Omega)$ onto $\Omega$ which is the inverse
of $\psi_e$.

Given a lossless code $(\psi_e, \psi_d)$ on structure universe $\Omega$ and a structure source $(\Omega, \mathcal{F}, P)$, then
for each $F \in \mathcal{F}$ we define the real number

$$R(\psi_e, F, P) = \sum_{\omega \in F,\, P(\omega) > 0} |\omega|^{-1} \{ L[\psi_e(\omega)] + \log_2 P(\omega) \} P(\omega),$$

which is called the $F$-th order average redundancy of the code $(\psi_e, \psi_d)$ with respect to the
source. We say that a lossless code $(\psi_e, \psi_d)$ on $\Omega$ is an asymptotically optimal code for
structure source $(\Omega, \mathcal{F}, P)$ if

$$\lim_{F \in \mathcal{F}} R(\psi_e, F, P) = 0. \tag{1.3}$$
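
As an illustration of the redundancy definition, the following minimal sketch evaluates the $F$-th order average redundancy for a toy case in the spirit of Example 3. The function name `average_redundancy`, the toy set `F`, and the two toy codes are assumptions made for this illustration only; they are not part of the paper.

```python
from math import log2


def average_redundancy(structures, size, prob, encode):
    """Sum of |w|^-1 * (L[encode(w)] + log2 P(w)) * P(w) over w in F with P(w) > 0."""
    total = 0.0
    for w in structures:
        p = prob(w)
        if p > 0:  # structures of probability zero are excluded from the sum
            total += (len(encode(w)) + log2(p)) * p / size(w)
    return total


# Hypothetical toy element F of a filter: all binary strings of length 2,
# with the uniform distribution, so that -log2 P(w) = 2 for every w.
F = ["00", "01", "10", "11"]
uniform = lambda w: 0.25
identity_code = lambda w: w        # 2-bit codewords: matches -log2 P(w) exactly
padded_code = lambda w: w + "0"    # 3-bit codewords: one wasted bit per structure

r_identity = average_redundancy(F, len, uniform, identity_code)
r_padded = average_redundancy(F, len, uniform, padded_code)
```

Here the identity code has redundancy zero, while the padded code wastes one bit per structure of size 2, giving redundancy 1/2 per unit of size.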

Universal Codes for Structure Source Families. Let $\mathcal{F}$ be a fixed $\Omega$-filter for structure
universe $\Omega$. Let $\mathcal{P}$ be a set of mappings from $\Omega$ into $[0, 1]$ such that (1.2) holds for every
$P \in \mathcal{P}$. A universal code for the family of structure sources $\{(\Omega, \mathcal{F}, P) : P \in \mathcal{P}\}$ (if it exists)
is a lossless code on $\Omega$ which is asymptotically optimal for every source in the family. The
universal source coding problem for a family of structure sources is to determine whether the
family has a universal code, and, if so, specify a particular universal code for the family.

There has been little previous work on universal coding of structure sources. One notable
exception is the work of Choi and Szpankowski [3], who devised a universal code for the
parametric family of Gilbert sources $\{S_\sigma : 0 < \sigma < 1\}$ introduced in Ex. 1. Peshkin [17]
and Busatto et al. [2] proposed grammar-based codes for compression of general graphical
structures and binary tree structures, respectively; as these authors did not use a probabilistic
structure source model, it is unclear whether their codes are universal in the sense of the
present paper (instead, they tested performance of their codes on actual structures).

Context-Free Grammar Background. In the present paper, we further develop the idea
behind the Busatto et al. code [2] to obtain a grammar-based code for binary trees which,
under weak conditions, we prove to be a universal code for families of binary tree sources. In
this Introduction, we describe the structure of our code in general terms; code implementation
details will be given in Section II. In order to describe the grammar-based nature of our code,
we need at this point to give some background information concerning deterministic context-free
grammars. A deterministic context-free grammar $G$ is a quadruple $(S_1, S_2, s^*, P)$ in which

• $S_1$ is a finite nonempty set whose elements are called the nonterminal variables of $G$.

• $S_2$ is a finite nonempty set whose elements are called the terminal variables of $G$. ($S_1 \cup S_2$
is the complete set of variables of $G$.)

• $s^*$ is a designated nonterminal variable called the start variable of $G$.

• $P$ is the finite set of production rules of $G$. $P$ has the same cardinality
as $S_1$. There is exactly one production rule for each nonterminal variable $s$, which takes
the form

$$s \to (s_1, s_2, \ldots, s_n), \tag{1.4}$$

where $n$ is a positive integer which can depend on the rule and $s_1, s_2, \ldots, s_n$ are variables
of $G$. $s$, $(s_1, \ldots, s_n)$, and $n$ are respectively called the left member, right member, and
arity of the rule (1.4).

Given a deterministic context-free grammar $G$, there is a unique up to isomorphism rooted
ordered vertex-labeled tree $t(G)$ (which can be finite or infinite) satisfying the following
properties:

• The label on the root vertex of $t(G)$ is the start variable of $G$.

• The label on each non-leaf vertex of $t(G)$ is a nonterminal variable of $G$.

• The label on each leaf vertex of $t(G)$ is a terminal variable of $G$.

• Let $s(v)$ be the variable of $G$ which is the label on each vertex $v$ of $t(G)$. For each
non-leaf vertex $v$ of $t(G)$ and its ordered children $v_1, v_2, \ldots, v_n$,

$$s(v) \to (s(v_1), s(v_2), \ldots, s(v_n))$$

is a production rule of $G$.

"Unique up to isomorphism" means that for any two such rooted ordered trees there is an
isomorphism between the trees as ordered trees that preserves the labeling (that is, corresponding
vertices under the isomorphism have the same label). We call $t(G)$ the derivation
tree of $G$.
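
To make the notion of derivation tree concrete, here is a minimal sketch that expands a deterministic context-free grammar into its derivation tree. The representation (a dict from each nonterminal to the tuple forming the right member of its single rule), the function names, and the `max_depth` guard are assumptions for this illustration, not constructs from the paper; the guard exists only because $t(G)$ can be infinite.

```python
def derivation_tree(rules, start, max_depth=50):
    """Return the derivation tree as nested (label, children) pairs.

    rules maps each nonterminal to the tuple on the right side of its unique
    production rule; any variable absent from rules is treated as terminal.
    """
    def expand(var, depth):
        if var not in rules:              # terminal variable: leaf vertex
            return (var, ())
        if depth >= max_depth:            # guard against infinite derivation trees
            raise RecursionError("derivation tree is infinite or deeper than max_depth")
        children = tuple(expand(child, depth + 1) for child in rules[var])
        return (var, children)

    return expand(start, 0)


def leaf_labels(tree):
    """Left-to-right sequence of labels on the leaves of a derivation tree."""
    label, children = tree
    if not children:
        return [label]
    labels = []
    for child in children:
        labels.extend(leaf_labels(child))
    return labels


# Hypothetical grammar: S -> (A, A), A -> (a, b), with start variable S.
toy_rules = {"S": ("A", "A"), "A": ("a", "b")}
toy_tree = derivation_tree(toy_rules, "S")
toy_leaves = leaf_labels(toy_tree)
```

For the toy grammar the leaves read `a, b, a, b` from left to right, while a self-referential rule such as `S -> (S, a)` triggers the depth guard.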

Outline of Binary Tree Compression Code. Let $\mathcal{T}$ be the structure universe of binary trees
introduced in Ex. 2. Suppose $t \in \mathcal{T}$ and suppose $G$ is a deterministic context-free grammar
such that the arity of each production rule is two. Then we say that $G$ forms a representation
of $t$ if $t$ is the unique tree in $\mathcal{T}$ isomorphic as an ordered tree to the tree which results when
all vertex labels on the derivation tree of $G$ are removed. In Section II, we will assign to each
$t \in \mathcal{T}$ a particular deterministic context-free grammar $G_t$ which forms a representation of $t$.
Then we will assign to $G_t$ a binary codeword $B(G_t)$ so that the prefix condition is satisfied.
The grammar-based binary tree code of this paper is then the lossless code $(\phi_e, \phi_d)$ on $\mathcal{T}$ in
which the encoding map $\phi_e$ and decoding map $\phi_d$ each operate in two steps as follows.

• Encoding Step 1: Given binary tree $t \in \mathcal{T}$, obtain the context-free grammar $G_t$ from $t$.

• Encoding Step 2: Assign to grammar $G_t$ the binary word $B(G_t) \in \mathcal{B}$, and then $B(G_t)$
is the codeword $\phi_e(t)$ for $t$.

• Decoding Step 1: The grammar $G_t$ is obtained from $B(G_t)$, which is the inverse of the
second encoding step.

• Decoding Step 2: $G_t$ is used to obtain the derivation tree of $G_t$, from which $t$ is obtained
by removing all labels.

The two-step encoding/decoding maps $\phi_e$ and $\phi_d$ are depicted schematically in the following
diagrams:

$$\text{Encoding Map } \phi_e: \quad t \in \mathcal{T} \xrightarrow{\text{1st step}} G_t \xrightarrow{\text{2nd step}} B(G_t) = \phi_e(t) \in \mathcal{B}$$

$$\text{Decoding Map } \phi_d: \quad B(G_t) \xrightarrow{\text{1st step}} G_t \xrightarrow{\text{2nd step}} t = \phi_d(B(G_t))$$



We point out the parallel between the grammar-based binary tree compression algorithm
of this paper and the grammar-based lossless data compression methodology for data strings
presented in [10]. In the grammar-based approach to compression of a data string $x$, one
transforms $x$ into a deterministic context-free grammar $G_x$ from which $x$ is uniquely recoverable
as the sequence of labels on the leaves of the derivation tree of $G_x$; one then compresses
$G_x$ instead of $x$ itself. Similarly, in the grammar-based approach to binary tree compression
presented here, one transforms a binary tree $t$ into the deterministic context-free grammar $G_t$
from which $t$ is uniquely recoverable by stripping all labels from the derivation tree of $G_t$;
one then compresses $G_t$ instead of $t$ itself.

The rest of the paper is laid out as follows. In Sec. II, we present the implementation
details of the grammar-based binary tree compression code $(\phi_e, \phi_d)$. In Sec. III, we present
some weak conditions on a binary tree source under which $(\phi_e, \phi_d)$ will be an asymptotically
optimal code for the source. The remaining sections exploit these conditions to arrive at wide
families of binary tree sources on which $(\phi_e, \phi_d)$ is a universal code (families of leaf-centric
models in Sec. IV and families of depth-centric models in Sec. V).

II. Implementation of Binary Tree Compression Code 

This section is organized as follows. In Section II-A, we give some background regarding
binary trees that shall be used in the rest of the paper. Then, in Section II-B, we explain how to
transform each binary tree $t \in \mathcal{T}$ into the deterministic context-free grammar $G_t$; this is Step
1 of encoding map $\phi_e$. In Section II-C, there follows an explanation of how the codeword
$B(G_t)$ is obtained from $G_t$; this is Step 2 of encoding map $\phi_e$. Examples illustrating the
workings of the encoding map $\phi_e$ and the decoding map $\phi_d$ are presented in Section II-D.
Theorem 1 is then presented in Section II-E, which gives a performance bound for the code
$(\phi_e, \phi_d)$. Finally, in Section II-F, we discuss a sense in which the grammar $G_t$ is minimal and
unique among all grammars which form a representation of $t \in \mathcal{T}$.



A. Binary Tree Background 

We take the direction along each edge of a binary tree to be away from the root. The root 
vertex of a binary tree is the unique vertex which is not the child of any other vertex, the leaf 
vertices are the vertices that have no child, and each of the non-leaf vertices has exactly two 
ordered children. We regard a tree consisting of just one vertex to be a binary tree, which 
we call a trivial binary tree; all other binary trees have at least two leaves and are called 
non-trivial. Given a binary tree $t$, $V(t)$ shall denote the set of its vertices, and $V^1(t)$ shall
denote the set of its non-leaf vertices. Each edge of $t$ is an ordered pair $(a, b)$ of vertices in
$V(t)$, where $a$ is the vertex at which the edge begins and $b$ is the vertex at which the edge
ends ($a$ is the parent of $b$ and $b$ is a child of $a$). A path in a binary tree is defined to be
any sequence $(v_1, v_2, \ldots, v_k)$ of vertices of length $k \ge 2$ in which each vertex from $v_2$ onward
is a child of the preceding vertex. For each vertex $v$ of a binary tree which is not the root,
there is a unique path which starts at the root and ends at $v$. We define the depth level of
each non-root vertex $v$ of a binary tree to be one less than the number of vertices in the
unique path from root to $v$ (this is the number of edges along the path); we define the depth
level of the root to be zero. Vertex $v_2$ is said to be a descendant of vertex $v_1$ if there exists
a (necessarily unique) path leading from $v_1$ to $v_2$. If a binary tree has $n$ leaf vertices, then it
has $n - 1$ non-leaf vertices and therefore $2(n - 1)$ edges.

We have a locally defined order on each binary tree t in which each sibling pair of child 
vertices of t is ordered. From this locally defined order, one can infer various total orders on 
V{t) which are each consistent with the local orders on the sets of children. The most useful 
of the possible total orders for us will be the breadth-first order. If we list the vertices of a 
binary tree in breadth-first order, we first list the root vertex at depth level 0, then its two 
ordered children at depth level 1, then the vertices at depth level 2, depth level 3, etc. Two 
vertices $v_1, v_2$ at depth level $j > 0$ are consecutive in breadth-first order if and only if either
(a) $v_1, v_2$ have the same parent and $v_1$ precedes $v_2$ in the local ordering of children, or (b) the
parent of $v_1$ and the parent of $v_2$ are consecutive in the breadth-first ordering of the non-leaf
vertices at depth level $j - 1$. It is sometimes convenient to represent a tree $t$ pictorially via a
"top down" picture, where the root vertex of t appears at the top of the picture (depth level 
0) and edges extend downward in the picture to reach vertices of increasing depth level; the 
vertices at each depth level will appear horizontally in the picture with their left-right order 
corresponding to the breadth-first order. Fig. 1 depicts two binary trees with their vertices 
labeled in breadth-first order. 

The structure universe $\mathcal{T}$ consists only of nontrivial binary trees. Sometimes we need to
consider a trivial binary tree consisting of just one vertex. Fix such a trivial tree $t^*$. Then
$\mathcal{T}^* = \mathcal{T} \cup \{t^*\}$ can be taken as our structure universe of binary trees both trivial and nontrivial.
For each $n \ge 1$, letting $\mathcal{T}_n$ be the set of trees in $\mathcal{T}^*$ having $n$ leaves, and letting $K_n$ be the
cardinality of $\mathcal{T}_n$, it is well known [18] that $\{K_n : n \ge 1\}$ is the Catalan sequence, expressible
by the formula

$$K_n = \frac{1}{n} \binom{2(n-1)}{n-1}, \quad n \ge 1.$$

For example, using this formula, we have

$$K_1 = K_2 = 1, \quad K_3 = 2, \quad K_4 = 5, \quad K_5 = 14.$$
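
The Catalan formula is easy to check numerically. The sketch below is only an illustration; the function name `num_binary_trees` is an assumption, and `math.comb` requires Python 3.8 or later.

```python
from math import comb


def num_binary_trees(n):
    """K_n = (1/n) * C(2(n-1), n-1): number of binary trees with n leaves."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    # The binomial coefficient is always divisible by n here, so integer
    # division is exact.
    return comb(2 * (n - 1), n - 1) // n


first_five = [num_binary_trees(n) for n in range(1, 6)]
```

The same function gives the tree counts for 8 and 16 leaves, the sizes of the two trees used in the examples of this section.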



Fig. 1 depicts one of the $(1/8)\binom{14}{7} = 429$ binary trees in $\mathcal{T}_8$, and one of the $(1/16)\binom{30}{15} =
9{,}694{,}845$ binary trees in $\mathcal{T}_{16}$.

Fig. 1: Binary trees in $\mathcal{T}_8$ (left) and $\mathcal{T}_{16}$ (right) with breadth-first ordered vertices

A subtree of a binary tree $t$ is a tree whose edges and vertices are edges and vertices of
$t$; by convention, we require also that a subtree of a binary tree should be a (nontrivial or
trivial) binary tree. There are two special types of subtrees of a binary tree that shall be of
interest to us, namely final subtrees and initial subtrees. Given a binary tree $t$, a final subtree
of $t$ is a subtree of $t$ whose root is some fixed vertex of $t$ and whose remaining vertices are
all the descendants of this fixed vertex in $t$; an initial subtree of $t$ is any subtree of $t$ whose
root coincides with the root of $t$. If $t$ is any nontrivial binary tree and $v \in V(t)$, we define
$t(v)$ to be the unique binary tree in $\mathcal{T}^*$ which is isomorphic to the final subtree of $t$ rooted
at $v$. Note that $t(v) = t^*$ if $v$ is a leaf of $t$, and that $t(v) = t$ if $t \in \mathcal{T}$ and $v$ is the root of $t$.
There are also two other trees of the $t(v)$ type which appear often enough that we give them
a special name; letting $v_1, v_2$ be the ordered children of the root of nontrivial binary tree $t$,
we define $t_L = t(v_1)$ and $t_R = t(v_2)$ to reflect the respective left and right positions of these
trees in the top down pictorial representation of tree $t$.



B. Encoding Step 1 

Given $t \in \mathcal{T}$, we explain how to transform $t$ into the grammar $G_t$, which is Step 1 of the
encoding map $\phi_e$. Define $N = N(t)$ to be the cardinality of the set $\{t(v) : v \in V(t)\}$. Note that
$N \ge 2$ since $t^*$ and $t$ are distinct and both belong to this set. The set of nonterminal variables
of $G_t$ is the nonempty set of integers $\{0, 1, \ldots, N-2\}$. The set of terminal variables of $G_t$
is the singleton set $\{T\}$, where we have denoted the unique terminal variable as the special
symbol $T$. The start variable of $G_t$ is 0. All that remains to complete the definition of $G_t$ is to
specify the production rules of $G_t$. We do this indirectly by first labeling the vertices of $t$ in a
certain way and then extracting the production rules from the labeled tree. This labeling takes
place as follows. The root of $t$ is labeled 0 and each leaf of $t$ is labeled $T$. The vertices of $t$
are traversed in breadth-first order. Whenever a vertex $v$ is thus encountered which as yet has
no label, one checks to see whether $t(v)$ coincides with $t(v')$ for some previously traversed
vertex $v'$. If this is the case, $v$ is assigned the same label as $v'$; otherwise, $v$ is assigned label
equal to the smallest member of the set $\{0, 1, \ldots, N-2\}$ which has so far not been used as a
label. For each nonterminal variable $i \in \{0, 1, \ldots, N-2\}$, we can then extract from the labeled
tree the unique production rule of $G_t$ of form $i \to (i_1, i_2)$ by finding any vertex of the labeled
tree whose label is $i$; the entries $i_1, i_2$ are then the respective labels on the ordered children
of this vertex. Incidentally, the labeled tree we employed in this construction turns out to be
the derivation tree of $G_t$.
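
The labeling procedure above can be sketched in a few lines. This is an illustration under stated assumptions, not the paper's implementation: a leaf is modeled as `None` and a non-leaf vertex as a pair `(left, right)`, so that two vertices have the same $t(v)$ exactly when their nested tuples are equal; the names `LEAF` and `grammar_of_tree` are hypothetical. Children of an already labeled subtree are not revisited, which does not change the labels because a repeated subtree's descendants always follow the matching descendants of its first occurrence in breadth-first order.

```python
from collections import deque

LEAF = None  # assumed encoding: leaf = None, non-leaf = (left_subtree, right_subtree)


def grammar_of_tree(tree):
    """Return the production rules of G_t as a list: rules[i] == (i1, i2)."""
    if tree is LEAF:
        raise ValueError("tree must be nontrivial (at least two leaves)")
    label_of = {tree: 0}      # distinct nontrivial subtree -> nonterminal label
    order = [tree]            # order[i] is the subtree carrying label i
    queue = deque([tree])     # breadth-first traversal of first occurrences
    while queue:
        left, right = queue.popleft()
        for child in (left, right):
            if child is not LEAF and child not in label_of:
                label_of[child] = len(order)   # smallest label not yet used
                order.append(child)
                queue.append(child)

    def name(child):
        return "T" if child is LEAF else label_of[child]

    return [(name(left), name(right)) for (left, right) in order]


# An 8-leaf tree built bottom-up from shared subtrees (it has the grammar
# that appears in Example 5 of Section II-D).
t3 = (LEAF, LEAF)
t2 = (LEAF, t3)
t1 = (t2, t3)
tree8 = (t1, t2)
rules8 = grammar_of_tree(tree8)
```

For `tree8` the extracted rules are 0 → (1, 2), 1 → (2, 3), 2 → (T, 3), 3 → (T, T), so $N(t) = 5$.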

Figures 2-3 illustrate the results of Encoding Step 1 for the binary trees in Fig. 1. 
Fig. 2: Encoding Step 1 For Left Figure 1 tree
Fig. 3: Encoding Step 1 For Right Figure 1 tree



C. Encoding Step 2 

Fix $t \in \mathcal{T}$. We now explain Step 2 of the encoding of $t$, which is to obtain from the grammar
$G_t$ a string $B(G_t) \in \mathcal{B}$ which is taken as the codeword $\phi_e(t)$ of $t$. We will be employing two
sequences $S(t)$ and $S_1(t)$ defined as follows:

• Let $N = N(t)$. For each $i = 0, \ldots, N-2$, let ordered pair $(a_{2i+1}, a_{2i+2})$ be the right member
of the production rule of $G_t$ whose left member is $i$. Then $S(t)$ is the sequence of length
$2N - 2$ defined by

$$S(t) = (a_1, a_2, \ldots, a_{2N-3}, a_{2N-2}).$$

The alphabet of $S(t)$ is $A(t) = \{1, 2, \ldots, N-2\} \cup \{T\}$. Note that $G_t$ is fully recoverable
from $S(t)$.

• $S_1(t)$ is the sequence of length $N$ remaining after one deletes from $S(t)$ the first left-to-right
appearance in $S(t)$ of each member of the set $\{1, 2, \ldots, N-2\}$.

Note that $N = N(t) = 2$ if and only if $t$ is the unique tree in $\mathcal{T}_2$; in this case, $G_t$ has only
one production rule $0 \to (T, T)$, and $S(t) = S_1(t) = (T, T)$. If $N = 2$, define $B(G_t) = 1$. Now
assume $N > 2$. The codeword $B(G_t)$ will be obtained via processing of the sequence $S(t)$.
Note that $S(t)$ partitions into the two subsequences $S_1(t)$ (defined previously) and $S_2(t) =
(1, 2, \ldots, N-2)$. For each $a \in A(t)$, define $f_a$ to be the positive integer

$$f_a = \mathrm{card}\{1 \le i \le 2N-2 : a_i = a\},$$

that is, $(f_a : a \in A(t))$ is the un-normalized first-order empirical distribution of $S(t)$. Let $\mathcal{S}_1(t)$
be the set of all possible permutations of $S_1(t)$; the cardinality of $\mathcal{S}_1(t)$ is then computable as

$$\mathrm{card}(\mathcal{S}_1(t)) = \frac{N!}{f_T! \prod_{i=1}^{N-2} (f_i - 1)!}.$$

$B(G_t)$ is defined to be the left-to-right concatenation of the binary strings $B_1, B_2, B_3, B_4$
obtained as follows:

• $B_1$ is the binary string of length $N - 1$ consisting of $N - 2$ zeroes followed by 1.

• $B_2$ is the binary string of length $2N - 2$ in which there are exactly $N - 2$ entries equal
to 1, where these entries correspond to the first left-to-right appearances in $S(t)$ of the
members of the set $\{1, 2, \ldots, N-2\}$. Given $B_2$, one can reconstruct $S(t)$ from its two
subsequences $S_1(t)$ and $S_2(t)$.

• $B_3$ is the binary string consisting of $N - 1$ alternate runs of ones and zeroes, where the
lengths of the runs (left-to-right) are taken to be $f_1, f_2, \ldots, f_{N-2}, 1$, respectively. Since
$f_T > 1$, $B_3$ is of length less than $2N - 2$.

• Let $M(t) = \lceil \log_2 \mathrm{card}(\mathcal{S}_1(t)) \rceil$. If $M(t) = 0$, $B_4$ is the empty string. Otherwise, list all
members of $\mathcal{S}_1(t)$ in the lexicographical ordering resulting from the ordering $1, \ldots, N-2, T$
of the alphabet $A(t)$. Assign each member of the list an index, starting with index 0.
Let $I$ be the index of $S_1(t)$ in this list. $B_4$ is the length-$M(t)$ binary expansion of integer
$I$.
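
The construction of $B_1, B_2, B_3, B_4$ can be sketched as follows. This is a minimal illustration, assuming the grammar is given as a list `rules` with `rules[i] == (i1, i2)` and the terminal written as the string `"T"`; the names `encode_grammar`, `lex_rank`, and `multinomial` are hypothetical. The index $I$ is computed by counting lexicographically smaller permutations directly instead of forming the list, which is one possible enumerative approach and not necessarily the one the authors used.

```python
from collections import Counter
from math import factorial


def multinomial(counts):
    """Number of distinct arrangements of a multiset with the given counts."""
    total = factorial(sum(counts.values()))
    for c in counts.values():
        total //= factorial(c)
    return total


def lex_rank(seq, alphabet):
    """Index (from 0) of seq among all its permutations in lexicographic order."""
    counts = Counter(seq)
    rank = 0
    for sym in seq:
        for smaller in alphabet:          # alphabet is listed in increasing order
            if smaller == sym:
                break
            if counts[smaller] > 0:       # permutations with a smaller symbol here
                counts[smaller] -= 1
                rank += multinomial(counts)
                counts[smaller] += 1
        counts[sym] -= 1
    return rank


def encode_grammar(rules):
    """Return the codeword B(G_t) = B1 B2 B3 B4 as a string of '0'/'1'."""
    N = len(rules) + 1
    if N == 2:
        return "1"
    S = [a for rule in rules for a in rule]            # S(t), length 2N - 2
    seen, b2, S1 = set(), [], []
    for a in S:
        if a != "T" and a not in seen:                 # first appearance: goes to S2(t)
            seen.add(a)
            b2.append("1")
        else:                                          # everything else forms S1(t)
            b2.append("0")
            S1.append(a)
    freq = Counter(S)
    b1 = "0" * (N - 2) + "1"
    runs = [freq[i] for i in range(1, N - 1)] + [1]    # f_1, ..., f_{N-2}, 1
    b3 = "".join(("1" if k % 2 == 0 else "0") * r for k, r in enumerate(runs))
    alphabet = list(range(1, N - 1)) + ["T"]
    M = (multinomial(Counter(S1)) - 1).bit_length()    # ceil(log2 card) for card >= 1
    b4 = f"{lex_rank(S1, alphabet):0{M}b}" if M > 0 else ""
    return b1 + "".join(b2) + b3 + b4


rules8 = [(1, 2), (2, 3), ("T", 3), ("T", "T")]
codeword8 = encode_grammar(rules8)
```

For the four-rule grammar `rules8`, the sketch yields the 23-bit codeword that is decoded by hand in Example 5.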

Verification of Prefix Condition. Suppose $t \in \mathcal{T}$ has been processed by the encoding map $\phi_e$
to yield codeword $\phi_e(t) = B(G_t)$. Step 1 of the decoding map $\phi_d$ is to determine the grammar
$G_t$ from $B(G_t)$. More generally, we discuss here how $S(t)$ and hence $G_t$ is recoverable from
any binary word $w$ of which codeword $B(G_t) = B_1B_2B_3B_4$ is a prefix; this will establish that
the encoding map $\phi_e : \mathcal{T} \to \mathcal{B}$ satisfies the prefix condition. Scanning $w$ left-to-right to find
the first 1, one determines $B_1$ and $N = N(t)$. $B_2$ is then determined from the fact that its length
is $2N - 2$, and then $B_3$ is determined from the fact that it consists of $N - 1$ runs, the last of
which has length 1. Knowledge of $B_3$ allows one to determine the set $\mathcal{S}_1(t)$ and to compute
$M(t)$, the length of $B_4$, whence $B_4$ can be extracted from $w$. From $B_4$, one is able to locate
$S_1(t)$ in the list of the members of $\mathcal{S}_1(t)$. Using $B_2$, one is able to put together $S(t)$ from
$S_1(t)$ and $S_2(t)$.
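
The parsing argument above can be sketched as a decoder that reads $B_1, B_2, B_3, B_4$ from the front of a word $w$ and reports how many bits it consumed. This is an illustration only: the names `decode_grammar`, `unrank`, and `multinomial` are assumptions, the terminal is written as the string `"T"`, and the input is assumed to begin with a valid codeword (no error handling for malformed or truncated input).

```python
from math import factorial


def multinomial(counts):
    """Number of distinct arrangements of a multiset with the given counts."""
    total = factorial(sum(counts.values()))
    for c in counts.values():
        total //= factorial(c)
    return total


def unrank(index, counts, alphabet):
    """Return the permutation with the given lexicographic index."""
    counts = dict(counts)
    seq = []
    for _ in range(sum(counts.values())):
        for sym in alphabet:
            if counts.get(sym, 0) == 0:
                continue
            counts[sym] -= 1
            block = multinomial(counts)    # permutations starting with sym here
            if index < block:
                seq.append(sym)
                break
            index -= block
            counts[sym] += 1
    return seq


def decode_grammar(w):
    """Parse a codeword from the front of w; return (rules, bits_consumed)."""
    zeros = w.index("1")                   # B1 = N-2 zeroes followed by 1
    N, pos = zeros + 2, zeros + 1
    if N == 2:
        return [("T", "T")], pos
    b2 = w[pos:pos + 2 * N - 2]
    pos += 2 * N - 2
    freqs, bit = [], "1"                   # B3: runs of lengths f_1, ..., f_{N-2}, 1
    for _ in range(N - 2):
        run = 0
        while w[pos] == bit:
            run += 1
            pos += 1
        freqs.append(run)
        bit = "0" if bit == "1" else "1"
    pos += 1                               # final run of length 1
    counts = {i + 1: f - 1 for i, f in enumerate(freqs)}
    counts["T"] = (2 * N - 2) - sum(freqs)             # f_T
    alphabet = list(range(1, N - 1)) + ["T"]
    M = (multinomial(counts) - 1).bit_length()         # length of B4
    index = int(w[pos:pos + M], 2) if M > 0 else 0
    pos += M
    s1 = iter(unrank(index, counts, alphabet))
    S, next_label = [], 1
    for b in b2:                           # merge S1(t) and S2(t) using B2
        if b == "1":
            S.append(next_label)
            next_label += 1
        else:
            S.append(next(s1))
    rules = [(S[2 * i], S[2 * i + 1]) for i in range(N - 1)]
    return rules, pos


decoded, used = decode_grammar("00011101000010011000001" + "1101")
```

Because the parser stops after exactly the bits of $B_1B_2B_3B_4$, the extra trailing bits in the usage line are ignored, which is the prefix property in action.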

D. Encoding/Decoding Examples 

We present two examples. Example 4 illustrates how the encoding map $\phi_e$ works, and
Example 5 illustrates how the decoding map $\phi_d$ works.

Example 4. Let $t$ be the tree on the right in Fig. 1. Fig. 3 illustrates the results of Step 1
of encoding map $\phi_e$. We then obtain

$$\begin{aligned}
N &= N(t) = 8, \\
S(t) &= (1,2,3,4,3,T,5,4,T,6,6,T,T,T), \\
S_1(t) &= (3,T,4,T,6,T,T,T), \\
S_2(t) &= (1,2,3,4,5,6), \\
f_1 &= f_2 = f_5 = 1, \quad f_3 = f_4 = f_6 = 2, \quad f_T = 5, \\
B_1 &= 0000001, \\
B_2 &= 11110010010000, \\
B_3 &= 1011001001.
\end{aligned}$$

We now list the $8!/5! = 336$ members of $\mathcal{S}_1(t)$ in lexicographical order until $S_1(t)$ is obtained:

index   sequence              index   sequence
0       (3,4,6,T,T,T,T,T)     7       (3,6,T,4,T,T,T,T)
1       (3,4,T,6,T,T,T,T)     8       (3,6,T,T,4,T,T,T)
2       (3,4,T,T,6,T,T,T)     9       (3,6,T,T,T,4,T,T)
3       (3,4,T,T,T,6,T,T)     10      (3,6,T,T,T,T,4,T)
4       (3,4,T,T,T,T,6,T)     11      (3,6,T,T,T,T,T,4)
5       (3,4,T,T,T,T,T,6)     12      (3,T,4,6,T,T,T,T)
6       (3,6,4,T,T,T,T,T)     13      (3,T,4,T,6,T,T,T)

The index of $S_1(t)$ is thus $I = 13$. (Alternatively, one can use the method of Cover [4] to
compute $I$ directly without forming the above list.) To obtain $B_4$, we expand the index $I = 13$
into its $\lceil \log_2 336 \rceil = 9$ bit binary expansion, which yields

$$B_4 = 000001101.$$

The codeword $\phi_e(t) = B_1B_2B_3B_4$ is of length $7 + 14 + 10 + 9 = 40$.
Example 5. Let binary tree $t \in \mathcal{T}$ be such that

$$\phi_e(t) = B(G_t) = 00011101000010011000001.$$

We employ the decoding map $\phi_d$ to find $t$ from $B(G_t)$. In Decoding Step 1, the grammar
$G_t$ must be determined, which, as remarked earlier, is equivalent to finding the sequence
$S(t)$. $B(G_t) = B_1B_2B_3B_4$ must be parsed into its constituent parts $B_1, B_2, B_3, B_4$. $B_1$ is the unique
prefix of $B(G_t)$ belonging to the set $\{1, 01, 001, 0001, \ldots\}$, whence $B_1 = 0001$, and hence
$N = N(t) = 4 + 1 = 5$. Thus, $S(t)$ and $B_2$ are both of length $2N - 2 = 8$, whence

$$B_2 = 11010000$$

and $S(t)$ is of the form

$$S(t) = (a_1, a_2, a_3, a_4, a_5, a_6, a_7, a_8).$$

The positions of symbol 1 in $B_2$ tell us that

$$S_2(t) = (a_1, a_2, a_4) = (1, 2, 3),$$

and therefore $S_1(t)$ is made up of the remaining entries in $S(t)$, giving us

$$S_1(t) = (a_3, a_5, a_6, a_7, a_8).$$

Since $B_3$ consists of $N - 1 = 4$ runs of ones and zeroes, with the last run of length 1, we
must have

$$B_3 = 100110.$$

The alphabet of $S(t)$ is $\{1, 2, \ldots, N-2, T\} = \{1, 2, 3, T\}$, and so from $B_3$ the frequencies of
1, 2, 3 in $S(t)$ are the lengths of the first three runs in $B_3$, respectively, whence

$$f_1 = 1, \quad f_2 = 2, \quad f_3 = 2.$$

The remaining entries of $S(t)$ are all equal to $T$, giving us $f_T = 8 - (1 + 2 + 2) = 3$. It follows
that $S_1(t)$ consists of $f_1 - 1 = 0$ entries equal to 1, $f_2 - 1 = 1$ entry equal to 2, $f_3 - 1 = 1$ entry
equal to 3, and $f_T = 3$ entries equal to $T$. Consequently, $\mathcal{S}_1(t)$ is the set of all permutations of
$(2, 3, T, T, T)$. The cardinality of this set is $5!/3! = 20$, and so $B_4$ is of length $\lceil \log_2 20 \rceil = 5$.
This checks with what is left of $B(G_t) = B_1B_2B_3B_4$ after $B_1, B_2, B_3$ are removed, namely

$$B_4 = 00001.$$

The index of $S_1(t)$ in the list of the members of $\mathcal{S}_1(t)$ is thus $I = 1$. This list starts with
$(2, 3, T, T, T)$, which has index 0, and the sequence following this must therefore be $S_1(t)$.
We conclude that

$$S_1(t) = (2, T, 3, T, T).$$

$S_1(t)$ and $S_2(t)$ now both being known, we put them together to obtain

$$S(t) = (1, 2, 2, 3, T, 3, T, T).$$

Partitioning $S(t)$ into blocks of length two, we obtain the four production rules of $G_t$ in Fig.
2, whereupon $G_t$ is determined, completing Decoding Step 1. In Decoding Step 2, one grows
the derivation tree of $G_t$ from the production rules of $G_t$ as explained in the Introduction,
giving us the derivation tree in Fig. 2; stripping the labels from this tree, we obtain the binary
tree $t$ on the left in Fig. 1, completing Decoding Step 2.
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E. Performance Bound 

We present Theorem 1, which gives us an upper bound on the lengths of the binary 
codewords assigned by the encoding map (|)e which shall be useful in later sections. Theorem 1 
uses the notion of the first order empirical probability distribution of a sequence {s\,S2t- ■■ ,Sn) 
whose entries are selected from a finite alphabet A, which is the probability distribution 
P = {Pa '-a eA) defined by 

Pa = n^ card{l < i <n : Si = a}, a eA. 
The Shannon entropy H{p) of this first order empirical distribution p is defined as 

H{p) = J^-P«l0g2Pfl, 

aeA 

which is also expressible as 

n 

Hip)=n-^Y,-'^'^S2Psi- 

Theorem 1. Let $t$ be any binary tree in $\mathcal{T}$. Let $p_t$ be the first order empirical probability 
distribution of the sequence $S_1(t)$. Then 

$$L[\phi_e(t)] \le 5(N(t) - 1) + N(t)H(p_t). \tag{2.5}$$

Proof. Let $N = N(t)$. We have $N \ge 2$. If $N = 2$, then $t$ is the unique tree in $\mathcal{T}_2$ and $L[\phi_e(t)] = 1$, 
whence (2.5) holds because the right side is 5. Assume $N > 2$. Recall that $\mathcal{S}_1(t)$ is the set 
of all permutations of $S_1(t)$. From the relationships 

$$L[\phi_e(t)] = \sum_{i=1}^{4} L[B_i] = 3(N - 1) + L[B_3] + \lceil \log_2(\mathrm{card}(\mathcal{S}_1(t))) \rceil,$$

$$L[B_3] \le 2N - 3,$$

$$\lceil \log_2(\mathrm{card}(\mathcal{S}_1(t))) \rceil \le \log_2(\mathrm{card}(\mathcal{S}_1(t))) + 1,$$

we obtain 

$$L[\phi_e(t)] \le 5(N - 1) + \log_2(\mathrm{card}(\mathcal{S}_1(t))).$$

Since $\mathcal{S}_1(t)$ is a type class of sequences of length $N$ in the sense of Chapter 2 of [5], Lemma 
2.3 of [5] tells us that 

$$\log_2(\mathrm{card}(\mathcal{S}_1(t))) \le N H(p_t).$$

Inequality (2.5) is now evident. 
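The type class bound used in the last step can be checked numerically on the sequence $(2, T, 3, T, T)$ from the decoding example. The following is only a sanity-check sketch with hypothetical helper names; it computes the first order empirical entropy and the size of the set of all permutations of a sequence, and compares the two sides of the bound.

```python
from collections import Counter
from math import factorial, log2


def empirical_entropy(seq):
    """Shannon entropy (bits) of the first order empirical distribution of seq."""
    n = len(seq)
    return sum(-(c / n) * log2(c / n) for c in Counter(seq).values())


def type_class_size(seq):
    """Number of distinct permutations of seq (a multinomial coefficient)."""
    size = factorial(len(seq))
    for c in Counter(seq).values():
        size //= factorial(c)
    return size


s1 = (2, "T", 3, "T", "T")
N = len(s1)
lhs = log2(type_class_size(s1))   # log2 of the type class cardinality
rhs = N * empirical_entropy(s1)   # N times the empirical entropy
bound = 5 * (N - 1) + rhs         # right side of the Theorem 1 bound for this N
```

For this sequence the type class has 20 members, so `lhs` is $\log_2 20$, which does not exceed `rhs`.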
F. Minimality/Uniqueness of $G_t$ 

Given $t \in \mathcal{T}$, we discuss what distinguishes $G_t$ among the possibly many deterministic 
context-free grammars which form a representation of $t$. First, we explain what it means for 
a directed acyclic graph (DAG) to be a representation of $t$. Let $D$ be a finite rooted DAG with 
at least two vertices such that each non-leaf vertex has exactly two ordered edges. Define 
$G(D)$ to be the deterministic context-free grammar whose set of nonterminal variables is the 
set of non-leaf vertices of $D$, whose set of terminal variables is the set of leaf vertices of 
$D$, whose start variable is the root vertex of $D$, and whose production rules are all the rules 
of the form $v \to (v_1, v_2)$ in which $v$ is a non-leaf vertex of $D$, and $v_1, v_2$ are the respective 
vertices of $D$ at the terminus of the edges 1, 2 emanating from $v$. Then we say that $D$ is a 
representation of $t \in \mathcal{T}$ if the grammar $G(D)$ forms a representation of $t$. It is known that 
each binary tree in $\mathcal{T}$ has a unique DAG representation up to isomorphism with the minimal 
number of vertices [14]; we call this DAG the minimal DAG representation of the binary tree. 
One particular choice of minimal DAG representation of $t \in \mathcal{T}$ is the DAG $D^*(t)$ defined as 
follows. The set of vertices of $D^*(t)$ is $\{t(v) : v \in V(t)\}$. The root vertex of $D^*(t)$ is $t$, and $t^*$ 
is the unique leaf vertex of $D^*(t)$. If $u$ is a non-leaf vertex of $D^*(t)$, then there are exactly two 
ordered edges emanating from $u$, edge 1 terminating at $u_L$ and edge 2 terminating at $u_R$. Note 
that the number of vertices of the minimal DAG representation $D^*(t)$ of $t$ is $N(t)$, which 
coincides with the number of variables of $G_t$. (Recall that the complete set of variables 
of $G_t$ is $\{0, 1, \ldots, N(t) - 2\} \cup \{T\}$, of cardinality $N(t)$.) The paper [2] gives a linear-time 
algorithm for computing $D^*(t)$. Fig. 4 illustrates a binary tree together with its minimal DAG 
representation. 
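To make the construction of a minimal DAG concrete, here is a minimal sketch that merges identical subtrees by hashing. It is not the algorithm of [2]; the tree encoding (nested pairs, with the string `"T"` standing for the one-vertex tree), the function names, and the vertex numbering (post-order, which need not match the numbering used for the variables of the grammar) are all assumptions made for illustration. The example tree is the one generated by the rules $0 \to (1,2)$, $1 \to (2,3)$, $2 \to (T,3)$, $3 \to (T,T)$, read off from the blocks of $S(t) = (1,2,2,3,T,3,T,T)$ on the assumption that the $i$-th block is the right-hand side for variable $i$.

```python
LEAF = "T"  # stands for the one-vertex tree


def minimal_dag(tree):
    """Return (root_id, rules, vertex_count) for a tree given as nested pairs."""
    ids = {}    # canonical key of a distinct subtree -> DAG vertex id
    rules = {}  # non-leaf vertex id -> (left child id, right child id)

    def visit(node):
        # Two subtrees are identical exactly when their keys are equal.
        key = LEAF if node == LEAF else (visit(node[0]), visit(node[1]))
        if key not in ids:
            ids[key] = len(ids)
            if key != LEAF:
                rules[ids[key]] = key
        return ids[key]

    root = visit(tree)
    return root, rules, len(ids)


def count_leaves(tree):
    return 1 if tree == LEAF else count_leaves(tree[0]) + count_leaves(tree[1])


# Tree generated by the rules listed in the lead-in (assumed reading of S(t)).
v3 = (LEAF, LEAF)
v2 = (LEAF, v3)
v1 = (v2, v3)
t = (v1, v2)

root, rules, n_vertices = minimal_dag(t)
ratio = n_vertices / count_leaves(t)  # vertices of the DAG per leaf of the tree
```

For this tree the DAG has 5 vertices (four non-leaf vertices plus the single leaf vertex) while the tree has 8 leaves.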

Lemma 1. Let $t \in \mathcal{T}$. Then $G_t$ has the smallest number of variables among all deterministic 
context-free grammars which form a representation of $t$. 

Proof. Let $G$ be a deterministic context-free grammar which forms a representation of $t$. 
The proof consists in showing that the number of variables of $G$ is at least $N(t)$, the number 
of variables of $G_t$. In the following, we explain how to extract from the derivation tree $t(G)$ 
of $G$ a rooted ordered DAG $D(t)$ which is a representation of $t$. The set of vertices of $D(t)$ 
is the set of labels on the vertices of $t(G)$. The root vertex of $D(t)$ is the label on the root 
vertex of $t(G)$, the set of non-leaf vertices of $D(t)$ is the set of labels on the non-leaf vertices 
of $t(G)$, and the set of leaf vertices of $D(t)$ is the set of labels on the leaf vertices of $t(G)$. 
Let $s$ be any non-leaf vertex of $D(t)$. Find a vertex $v$ of $t(G)$ whose label is $s$, and let $s_1, s_2$ 
be the respective labels on the ordered children of $v$ in $t(G)$; the pair $(s_1, s_2)$ thus derived 
will be the same no matter which vertex $v$ of $t(G)$ with label $s$ is chosen. There are exactly 
two ordered edges of $D(t)$ emanating from $s$, namely, edge 1 which terminates at $s_1$ and edge 
2 which terminates at $s_2$. This completes the specification of the DAG $D(t)$. By construction 
of $D(t)$, the number of variables of $G$ is at least as much as the number of vertices of $D(t)$. 
Since $D(t)$ is a DAG representation of $t$, the number of vertices of $D(t)$ is at least as much 
as the number of vertices $N(t)$ of the minimal DAG representation of $t$. Thus, the number of 
variables of $G$ is at least $N(t)$, completing the proof. 

Remark. With some more work, one can show that any deterministic context-free grammar 
which forms a representation of $t \in \mathcal{T}$ and has the same number of variables as $G_t$ must be 
isomorphic to $G_t$, using the known fact mentioned earlier that the minimal DAG representation 
of $t$ is unique up to isomorphism. This gives us a sense in which $G_t$ is unique. 

Fig. 4: A binary tree (left) and its minimal DAG representation (right) 



III. Sources For Which $(\phi_e, \phi_d)$ is Asymptotically Optimal 

This section examines the asymptotic performance of the code $(\phi_e, \phi_d)$ on a binary tree 
source. We put forth weak sufficient conditions on a binary tree source so that our two-step 
grammar-based code $(\phi_e, \phi_d)$ will be an asymptotically optimal code for the source. Before 
doing that, we need to first establish a lemma giving an asymptotic average redundancy lower 
bound for general structure sources. 

Let $(\Omega, \mathcal{F}, P)$ be an arbitrary structure source. Let $(\psi_e, \psi_d)$ be a lossless code on $\Omega$, 
and let $F \in \mathcal{F}$ be such that every structure $\omega \in F$ is of the same size. The well-known entropy 
lower bound for prefix codes tells us that 

$$\sum_{\omega \in F} L[\psi_e(\omega)] P(\omega) \ge \sum_{\omega \in F,\, P(\omega) > 0} -P(\omega) \log_2 P(\omega),$$

from which it follows that 

$$R(\psi_e, F, P) \ge 0,$$

that is, the $F$-th order average redundancy of the code with respect to the source is 
non-negative. Although this redundancy non-negativity property fails for a general structure source, 
the following result gives us an asymptotic sense in which average redundancy is non-negative. 

Lemma 2. Let $(\Omega, \mathcal{F}, P)$ be a general structure source. Then 

$$\liminf_{F \in \mathcal{F}} R(\psi_e, F, P) \ge 0 \tag{3.6}$$

for any lossless code $(\psi_e, \psi_d)$ on $\Omega$. 

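Proof. Fix a general structure source $(\Omega, \mathcal{F}, P)$. Let $\mathcal{Q}$ be the set of all $Q : \Omega \to (0, 1)$ such 
that the restriction of $Q$ to each $F \in \mathcal{F}$ is a probability distribution on $F$. In the first part of 
the proof, we show that 

$$\liminf_{F \in \mathcal{F}} \sum_{\omega \in F} |\omega|^{-1} P(\omega) \log_2\left(\frac{P(\omega)}{Q(\omega)}\right) \ge 0, \quad Q \in \mathcal{Q}, \tag{3.7}$$

where in (3.7) and henceforth, any expected value of the form $\sum_{\omega \in F} P(\omega) g(\omega)$ is computed by 
summing only over those $\omega \in F$ for which $P(\omega) > 0$. The proof of (3.7) exploits the concept 
of divergence. If $p = (p_j : j \in A)$ and $q = (q_j : j \in A)$ are any two probability distributions on 
a finite set $A$, with all $q_j$ probabilities $> 0$, we let $D(p|q)$ denote the divergence of $p$ with 
respect to $q$, defined by 

$$D(p|q) = \sum_{j \in A} p_j \log_2\left(\frac{p_j}{q_j}\right).$$

It is well-known that $D(p|q) \ge 0$. Fix an arbitrary $Q \in \mathcal{Q}$. Given $F \in \mathcal{F}$, let $I_F = \{|\omega| : \omega \in F\}$, 
and for each $i \in I_F$, let $F_i = \{\omega \in F : |\omega| = i\}$. Furthermore, let $P_F, Q_F$ be the probability 
distributions on $I_F$ such that 

$$P_F(i) = P(F_i), \quad i \in I_F,$$
$$Q_F(i) = Q(F_i), \quad i \in I_F,$$

and for each $i \in I_F$, let $P_F^i, Q_F^i$ be probability distributions on $F_i$ such that 

$$P(\omega) = P_F(i) P_F^i(\omega), \quad \omega \in F_i,$$
$$Q(\omega) = Q_F(i) Q_F^i(\omega), \quad \omega \in F_i.$$

It is easy to show that 

$$\sum_{\omega \in F} |\omega|^{-1} P(\omega) \log_2\left(\frac{P(\omega)}{Q(\omega)}\right) = \sum_{i \in I_F} i^{-1} P_F(i) D(P_F^i | Q_F^i) + \sum_{i \in I_F} i^{-1} P_F(i) \log_2\left(\frac{P_F(i)}{Q_F(i)}\right),$$

and therefore 

$$\sum_{\omega \in F} |\omega|^{-1} P(\omega) \log_2\left(\frac{P(\omega)}{Q(\omega)}\right) \ge \sum_{i \in I_F} i^{-1} P_F(i) \log_2\left(\frac{P_F(i)}{Q_F(i)}\right).$$

Let $E_F^P, E_F^Q$ be the expected values defined by 

$$E_F^P = \sum_{i \in I_F} i^{-1} P_F(i),$$
$$E_F^Q = \sum_{i \in I_F} i^{-1} Q_F(i).$$

Note that $E_F^P$ and $E_F^Q$ both belong to the interval $(0, 1]$. Let $P_F^*, Q_F^*$ be the probability 
distributions on $I_F$ defined by 

$$P_F^*(i) = i^{-1} P_F(i) / E_F^P, \quad i \in I_F,$$
$$Q_F^*(i) = i^{-1} Q_F(i) / E_F^Q, \quad i \in I_F.$$

Then we have 

$$\sum_{i \in I_F} i^{-1} P_F(i) \log_2\left(\frac{P_F(i)}{Q_F(i)}\right) = E_F^P D(P_F^* | Q_F^*) + E_F^P \log_2(1/E_F^Q) + E_F^P \log_2 E_F^P.$$

Since $1/E_F^Q \ge 1$, the first two terms on the right side of the preceding equality are non-negative, 
whence 

$$\liminf_{F \in \mathcal{F}} \sum_{\omega \in F} |\omega|^{-1} P(\omega) \log_2\left(\frac{P(\omega)}{Q(\omega)}\right) \ge \liminf_{F \in \mathcal{F}} E_F^P \log_2 E_F^P. \tag{3.8}$$

Note that 

$$0 < E_F^P \le \frac{1}{\min\{|\omega| : \omega \in F\}},$$

and so by (1.2) 

$$\lim_{F \in \mathcal{F}} E_F^P = 0. \tag{3.9}$$

Since $x \log_2 x \to 0$ as $x \to 0^+$, the right side of (3.8) is zero, and (3.7) holds. To finish the proof, 
let $(\psi_e, \psi_d)$ be any lossless code on $\Omega$. By Kraft's inequality for prefix codes, there exists 
$Q \in \mathcal{Q}$ such that 

$$L[\psi_e(\omega)] \ge -\log_2 Q(\omega), \quad \omega \in \Omega,$$

and hence 

$$R(\psi_e, F, P) = \sum_{\omega \in F} |\omega|^{-1} \{L[\psi_e(\omega)] + \log_2 P(\omega)\} P(\omega) \ge \sum_{\omega \in F} |\omega|^{-1} P(\omega) \log_2\left(\frac{P(\omega)}{Q(\omega)}\right).$$

(3.6) then follows by appealing to (3.7). 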

Remark. In view of Lemma 2, given a general structure source $(\Omega, \mathcal{F}, P)$, a lossless code 
$(\psi_e, \psi_d)$ on $\Omega$ is an asymptotically optimal code for the source if and only if 

$$\limsup_{F \in \mathcal{F}} R(\psi_e, F, P) \le 0. \tag{3.10}$$

We now turn our attention to properties of a binary tree source under which the 
grammar-based code $(\phi_e, \phi_d)$ on $\mathcal{T}$ will be asymptotically optimal for the source. There are two of 
these properties, the Domination Property and the Representation Ratio Negligibility Property, 
which are discussed in the following. 

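Domination Property. We define $\Lambda$ to be the set of all mappings $\lambda : \mathcal{T}^* \to (0, 1]$ such that 

• (a): $\lambda(t) \le \lambda(t_L)\lambda(t_R)$, $t \in \mathcal{T}$. 

• (b): There exists a positive integer $K(\lambda)$ such that 

$$1 \le \sum_{t \in \mathcal{T}_n} \lambda(t) \le n^{K(\lambda)}, \quad n \ge 1. \tag{3.11}$$

An element $\lambda$ of $\Lambda$ dominates a binary tree source $(\mathcal{T}, \mathcal{F}, P)$ if $P(t) \le \lambda(t)$ for all $t \in \mathcal{T}$. A 
binary tree source satisfies the Domination Property if there exists an element of $\Lambda$ which 
dominates the source. 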

Representation Ratio Negligibility Property. Let $t \in \mathcal{T}$. We define the representation ratio 
of $t$, denoted $r(t)$, to be the ratio between the number of variables of the grammar $G_t$ and 
the number of leaves of $t$. That is, $r(t) = N(t)/|t|$. Since 

$$N(t) = \mathrm{card}\{t(v) : v \in V(t)\} = 1 + \mathrm{card}\{t(v) : v \in V^1(t)\} \le 1 + (|t| - 1) = |t|,$$

the representation ratio is at most 1. In the main result of this section, Theorem 2, we 
will see that our ability to compress $t \in \mathcal{T}$ via the code $(\phi_e, \phi_d)$ becomes greater as $r(t)$ 
becomes smaller. We say that a binary tree source $(\mathcal{T}, \mathcal{F}, P)$ obeys the Representation Ratio 
Negligibility Property (RRN Property) if 

$$\lim_{F \in \mathcal{F}} \sum_{t \in F} r(t) P(t) = 0. \tag{3.12}$$

Definition. Henceforth, $\gamma : [0, 1] \to [0, \infty)$ is the function defined by 

$$\gamma(x) = \begin{cases} -(x/2)\log_2(x/2), & x > 0, \\ 0, & x = 0. \end{cases}$$
Theorem 2. The following statements hold: 

(a): For each $\lambda \in \Lambda$, 

$$|t|^{-1} \{L[\phi_e(t)] + \log_2 \lambda(t)\} \le (2K(\lambda) + 10)\,\gamma(r(t)), \quad t \in \mathcal{T}. \tag{3.13}$$

(b): Let $(\mathcal{T}, \mathcal{F}, P)$ be a binary tree source satisfying the Domination Property, where 
$\mathcal{F}$ can be any $\mathcal{T}$-filter. There exists a positive real number $C$, depending only on 
the source, such that 

$$R(\phi_e, F, P) \le C\,\gamma\left(\sum_{t \in F} r(t) P(t)\right), \quad F \in \mathcal{F}. \tag{3.14}$$

(c): $(\phi_e, \phi_d)$ is an asymptotically optimal code for any binary tree source which 
satisfies both the Domination Property and the RRN Property. 
Proof. It suffices to prove part (a). (Part (b) follows from part (a) and the fact that $\gamma$ is a 
concave function; part (c) follows from part (b) and (3.10).) Let $\lambda \in \Lambda$ be arbitrary. Fix $t \in \mathcal{T}$ 
and let $N = N(t)$. There is an initial binary subtree $t^\dagger$ of $t$ such that 

• There are $N$ leaf vertices of $t^\dagger$. 

• The subtrees $t(v)$ are distinct as $v$ ranges through the $N - 1$ non-leaf vertices of $t^\dagger$. 

(One can obtain $t^\dagger$ either by pruning the derivation tree of $G_t$ or by growing it using the 
production rules of $G_t$ so that in the growth process each production rule is used to extend a 
leaf exactly once; see Fig. 5.) Let $v_1, v_2, \ldots, v_N$ be an enumeration of the leaves of $t^\dagger$. There 
is a one-to-one correspondence between the set $\{t(v) : v \in V(t)\}$ and the set of variables 
of $G_t$, and under this correspondence, the sequence $s^* = (t(v_1), t(v_2), \ldots, t(v_N))$ is carried 
into a sequence which is a permutation of the sequence $S_1(t)$, and the first order empirical 
distribution $p^*$ of $s^*$ is carried into the first order empirical distribution $p_t$ of $S_1(t)$. Thus, the 
Shannon entropies $H(p^*)$, $H(p_t)$ coincide, and appealing to Theorem 1, we have 

$$L[\phi_e(t)] \le 5(N - 1) + \sum_{i=1}^{N} -\log_2 p^*(t(v_i)).$$

Define 

$$M_j = \sum_{u \in \mathcal{T}_j} \lambda(u), \quad j \ge 1.$$

There is a unique real number $D > 1/2$ such that 

$$q(u) = D M_j^{-1} |u|^{-2} \lambda(u), \quad u \in \mathcal{T}_j, \ j \ge 1 \tag{3.15}$$

defines a probability distribution on $\mathcal{T}^*$. Shannon's Inequality ([1], page 37) then gives us 

$$\sum_{i=1}^{N} -\log_2 p^*(t(v_i)) \le \sum_{i=1}^{N} -\log_2 q(t(v_i)).$$

Using formula (3.15) and the fact that $-\log_2 D < 1$, we obtain 

$$\sum_{i=1}^{N} -\log_2 q(t(v_i)) = N(-\log_2 D) + Q_1 + 2Q_2 + Q_3 \le N + Q_1 + 2Q_2 + Q_3,$$

where 

$$Q_1 = \sum_{i=1}^{N} \log_2 M_{|t(v_i)|},$$

$$Q_2 = \sum_{i=1}^{N} \log_2 |t(v_i)|,$$

$$Q_3 = -\sum_{i=1}^{N} \log_2 \lambda(t(v_i)).$$

We bound each of these quantities in turn. By (3.11), we obtain 

$$Q_1 \le K(\lambda) Q_2.$$

By concavity of the logarithm function, and recalling that $r(t) = N/|t|$, we have 

$$Q_2 \le N \log_2\left(\frac{\sum_{i=1}^{N} |t(v_i)|}{N}\right) = N \log_2(|t|/N) = 2|t|\gamma(r(t)) - N.$$

By property (a) for membership of $\lambda$ in $\Lambda$, we have 

$$Q_3 \le -\log_2 \lambda(t).$$

Combining previous bounds, and writing $K = K(\lambda)$, we see that 

$$L[\phi_e(t)] + \log_2 \lambda(t) \le 6N - (K + 2)N + 2(K + 2)|t|\gamma(r(t)) \le 3|t| r(t) + 2(K + 2)|t|\gamma(r(t))$$

holds, whence (3.13) holds because $r(t) \le 2\gamma(r(t))$, completing the proof of part (a) of 
Theorem 2. 
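The closing steps of this proof rest on two elementary identities for $\gamma$, which can be checked directly from its definition. The short derivation below is only a worked verification for $0 < x \le 1$ and $r(t) = N/|t|$; it introduces nothing beyond the quantities already defined.

```latex
\begin{align*}
2\gamma(x) &= -x\log_2(x/2) = x\log_2(2/x) \;\ge\; x\log_2 2 = x, \\
2|t|\,\gamma(r(t)) - N &= N\log_2\bigl(2|t|/N\bigr) - N = N\log_2\bigl(|t|/N\bigr), \\
6N - (K+2)N &\le 3N = 3|t|\,r(t) \le 6|t|\,\gamma(r(t)) \qquad (K \ge 1).
\end{align*}
```

Adding $2(K+2)|t|\gamma(r(t))$ to the last line and dividing by $|t|$ gives the constant $2K + 10$ that appears in the bound of part (a).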







Fig. 5: Initial subtree of Fig. 3 derivation tree used in Theorem 2 proof 



IV. Universal Coding of Leaf-Centric Binary Tree Sources 

We fix throughout this section the $\mathcal{T}$-filter $\mathcal{F}_1 = \{\mathcal{T}_n : n \ge 2\}$. We now formally define 
the set of leaf-centric binary tree sources, which are certain binary tree sources of the form 
$(\mathcal{T}, \mathcal{F}_1, P)$. Let $\mathbb{N}$ be the set of positive integers, and let $\Sigma_1$ be the set of all functions $\sigma$ from 
$\mathbb{N} \times \mathbb{N}$ into $[0, 1]$ such that 

$$\sum_{\{(i,j) : i,j \ge 1,\ i+j = n\}} \sigma(i, j) = 1, \quad n \ge 2.$$

For each $\sigma \in \Sigma_1$, let $P_\sigma$ be the mapping from $\mathcal{T}$ into $[0, 1]$ such that 

$$P_\sigma(t) = \prod_{v \in V^1(t)} \sigma(|t(v)_L|, |t(v)_R|), \quad t \in \mathcal{T}.$$

Since 

$$\sum_{t \in \mathcal{T}_n} P_\sigma(t) = 1, \quad n \ge 2,$$

$S(\sigma) = (\mathcal{T}, \mathcal{F}_1, P_\sigma)$ is a binary tree source. The sources in the family $\{S(\sigma) : \sigma \in \Sigma_1\}$ are 
called leaf-centric binary tree sources, the reason being that the probability of each tree is 
computed based purely upon the number of leaves in each of its final subtrees. Leaf-centric 
binary tree sources were first considered in the paper [12]. 

Example 6. Let $\Sigma_1^\dagger$ be the subset of $\Sigma_1$ consisting of all $\sigma \in \Sigma_1$ for which 

$$\{(i,j) : i,j \ge 1,\ i + j = n,\ \sigma(i,j) > 0\} \subset \{(1, n-1), (n-1, 1)\}, \quad n \ge 2.$$

If $\sigma \in \Sigma_1^\dagger$, then a tree $t \in \mathcal{T}$ with positive $P_\sigma$ probability must satisfy the property that there 
exist only two vertices of $t$ at each depth level of $t$ beyond level 0; we call such a binary tree 
a one-dimensional tree. Consider the structure universe of binary strings $\mathcal{B}$, in which the size 
of a string $b \in \mathcal{B}$ is taken to be its length $L[b]$. For each $n \ge 1$, let $\mathcal{B}_n$ be the set of strings in $\mathcal{B}$ 
of length $n$, and let $\mathcal{F}(\mathcal{B})$ be the $\mathcal{B}$-filter $\{\mathcal{B}_n : n \ge 1\}$. Let $[0, 1]^\infty$ be the set of all sequences 
$\alpha = (\alpha_i : i \ge 1)$ in which each $\alpha_i$ belongs to the interval $[0, 1]$, and for each $\alpha \in [0, 1]^\infty$, let 
$(\mathcal{B}, \mathcal{F}(\mathcal{B}), Q_\alpha)$ be the one-dimensional source in which for each string $b_1 b_2 \cdots b_n$ belonging 
to $\mathcal{B}$ we have 

$$Q_\alpha(b_1 b_2 \cdots b_n) = \prod_{i=1}^{n} q(\alpha_i, b_i),$$

where $q(\alpha_i, b_i)$ is taken to be $\alpha_i$ if $b_i = 0$ and taken to be $1 - \alpha_i$ otherwise. It is easy to see that 
the family of sources $\{(\mathcal{T}, \mathcal{F}_1, P_\sigma) : \sigma \in \Sigma_1^\dagger\}$ has a universal code if and only if the family of 
one-dimensional sources $\{(\mathcal{B}, \mathcal{F}(\mathcal{B}), Q_\alpha) : \alpha \in [0, 1]^\infty\}$ has a universal code. The third author 
has shown that this latter family of one-dimensional sources has no universal code. Therefore, 
the family $\{S(\sigma) : \sigma \in \Sigma_1^\dagger\}$ has no universal code, and so the bigger family of all leaf-centric 
binary tree sources also has no universal code. 

The following result shows that $(\phi_e, \phi_d)$ is a universal code for a suitably restricted 
subfamily of the family of leaf-centric binary tree sources. 

Theorem 3. Let $\Sigma_1^*$ be the uncountable set consisting of all $\sigma \in \Sigma_1$ such that 

$$\sup\left\{\frac{i + j}{\min(i, j)} : i,j \ge 1,\ \sigma(i,j) > 0\right\} < \infty. \tag{4.16}$$

Then $(\phi_e, \phi_d)$ is a universal code for the family of sources $\{S(\sigma) : \sigma \in \Sigma_1^*\}$. 

Before proceeding with the proof of Theorem 3, we provide an example of a source in 
$\{S(\sigma) : \sigma \in \Sigma_1^*\}$. 

Example 7. Given a general structure source $(\Omega, \mathcal{F}, P)$, then for each $F \in \mathcal{F}$, the $F$-th order 
entropy of the source is defined by 

$$H_F(P) = \sum_{\omega \in F} -|\omega|^{-1} P(\omega) \log_2 P(\omega).$$

$\lim_{F \in \mathcal{F}} H_F(P)$ is defined to be the entropy rate of the source, if the limit exists; otherwise, 
the source has no entropy rate. In universal source coding theory for families of classical 
one-dimensional sources (see Ex. 3), the sources are typically assumed to be stationary sources or 
finite-state sources, which are types of sources which have an entropy rate. In the universal 
coding of binary tree sources, however, one very often deals with sources which have no 
entropy rate. We illustrate a particular source of this type in the family $\{S(\sigma) : \sigma \in \Sigma_1^*\}$. Let 
$\sigma \in \Sigma_1^*$ be the function such that for each even $n \ge 2$, 

$$\sigma(n/2, n/2) = 1,$$

and for each odd $n \ge 3$, 

$$\sigma(\lfloor n/2 \rfloor, \lceil n/2 \rceil) = \sigma(\lceil n/2 \rceil, \lfloor n/2 \rfloor) = 1/2.$$

The resulting leaf-centric binary tree source $S(\sigma)$, introduced in [12], is called the bisection 
tree source model. In [9], it is shown that there is a unique nonconstant continuous periodic 
function $f : \mathbb{R} \to [0, 1]$, with period 1, such that 

$$-\log_2 P_\sigma(t) = |t|\, f(\log_2 |t|), \quad t \in \mathcal{T}, \tag{4.17}$$

and the restriction of $f$ to $[0, 1]$ is characterized as the attractor of a specific iterated function 
system on $[0, 1]$; because of this property, the source $S(\sigma)$ has no entropy rate. 
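As a concrete check on the definition of a leaf-centric source, the sketch below evaluates the product formula for $P_\sigma(t)$ under the bisection choice of $\sigma$ from this example and confirms, for small $n$, that the probabilities of the trees with $n$ leaves sum to 1. The tree encoding (nested pairs, with the empty tuple as the one-vertex tree) and all function names are assumptions made for illustration; exact rational arithmetic is used to avoid rounding.

```python
from fractions import Fraction
from functools import lru_cache

LEAF = ()  # the one-vertex tree


@lru_cache(maxsize=None)
def trees(n):
    """All binary trees with n leaves, as nested pairs (left, right)."""
    if n == 1:
        return (LEAF,)
    return tuple((left, right)
                 for i in range(1, n)
                 for left in trees(i)
                 for right in trees(n - i))


def leaves(t):
    return 1 if t == LEAF else leaves(t[0]) + leaves(t[1])


def bisection(i, j):
    """sigma(i, j) for the bisection model: split as evenly as possible."""
    if (i + j) % 2 == 0:
        return Fraction(1) if i == j else Fraction(0)
    return Fraction(1, 2) if abs(i - j) == 1 else Fraction(0)


def prob(t, sigma):
    """Product of sigma(|left|, |right|) over all non-leaf vertices of t."""
    if t == LEAF:
        return Fraction(1)
    left, right = t
    return sigma(leaves(left), leaves(right)) * prob(left, sigma) * prob(right, sigma)


totals = {n: sum(prob(t, bisection) for t in trees(n)) for n in range(2, 8)}
balanced4 = ((LEAF, LEAF), (LEAF, LEAF))
```

Here `totals[n]` is 1 for every `n` in the range, the balanced four-leaf tree has probability 1, and each of the two three-leaf trees has probability 1/2.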
Proof of Theorem 3. If $\sigma \in \Sigma_1$, let $\lambda : \mathcal{T}^* \to [0, 1]$ be the function such that $\lambda(t^*) = 1$ and 

$$\lambda(t) = \max(K_n^{-1}, P_\sigma(t)), \quad t \in \mathcal{T}_n, \ n \ge 2.$$

Then $\lambda \in \Lambda$ and $\lambda$ dominates $P_\sigma$. Thus, every source in the family $\{S(\sigma) : \sigma \in \Sigma_1^*\}$ satisfies 
the Domination Property. By Theorem 2, our proof will be complete once it is shown that 
every source in this family satisfies the RRN Property. More generally, we show that the RRN 
Property holds for any binary tree source $(\mathcal{T}, \mathcal{F}, P)$ for which 

$$\sup_{t \in \mathcal{T},\, P(t) > 0} \left\{ \max_{v \in V^1(t)} \frac{|t(v)|}{\min(|t(v)_L|, |t(v)_R|)} \right\} < \infty. \tag{4.18}$$

(The $\mathcal{T}$-filter $\mathcal{F}$ in the given source $(\mathcal{T}, \mathcal{F}, P)$ need not be equal to $\mathcal{F}_1$.) Let $C$ be a positive 
integer greater than or equal to the supremum on the left side of (4.18). Fix $t \in \mathcal{T}$ for which 
$P(t) > 0$. As in the proof of Theorem 2, let $t^\dagger$ be an initial binary subtree of $t$ with $N = N(t)$ 
leaves such that $\{t(v) : v \in V^1(t^\dagger)\} = \{t(v) : v \in V^1(t)\}$. Let $v_1, v_2, \ldots, v_N$ be an enumeration 
of the leaves of $t^\dagger$, and for each $i = 1, 2, \ldots, N$, let $u_i \in V^1(t^\dagger)$ be the parent vertex of $v_i$. We 
have 

$$\frac{|t(u_i)|}{|t(v_i)|} \le C, \quad i = 1, 2, \ldots, N,$$

and therefore 

$$\frac{|t(u_1)| + |t(u_2)| + \cdots + |t(u_N)|}{|t(v_1)| + |t(v_2)| + \cdots + |t(v_N)|} \le C.$$

The sum in the denominator is $|t|$, and so 

$$|t(u_1)| + |t(u_2)| + \cdots + |t(u_N)| \le |t| C. \tag{4.19}$$

Each $u \in \{u_1, \ldots, u_N\}$ can be the parent of at most two elements of the set $\{v_1, \ldots, v_N\}$, and 
so 

$$\mathrm{card}(\{u_1, \ldots, u_N\}) \ge (1/2)\,\mathrm{card}(\{v_1, \ldots, v_N\}) = N/2.$$

The mapping $u \to t(u)$ from the set $V^1(t^\dagger)$ into the set $\{t(v) : v \in V^1(t)\}$ is a one-to-one onto 
mapping (both sets have cardinality $N - 1$). Therefore, 

$$\mathrm{card}(\{t(u_1), t(u_2), \ldots, t(u_N)\}) \ge N/2. \tag{4.20}$$

Let $k = \lceil N/2 \rceil$. We conclude from (4.19)-(4.20) that there are $k$ distinct trees $t_1, t_2, \ldots, t_k$ in 
$\mathcal{T}$ whose total number of leaves is $\le |t| C$, where we suppose that these $k$ trees have been 
enumerated so that 

$$|t_1| \le |t_2| \le \cdots \le |t_k|.$$

Let $t(1), t(2), t(3), \ldots$ be an enumeration of all trees in $\mathcal{T}$ such that $t(1)$ is the unique tree in 
$\mathcal{T}_2$, $t(2), t(3)$ are the two trees in $\mathcal{T}_3$, $t(4), t(5), t(6), t(7), t(8)$ are the five trees in $\mathcal{T}_4$, and so 
forth. We clearly have $|t(i)| \le |t_i|$ for $i = 1, \ldots, k$. Therefore, 

$$|t(1)| + |t(2)| + \cdots + |t(k)| \le |t| C. \tag{4.21}$$

The sequence $m_i = |t(i)|$ can be characterized as the sequence in which $m_1 = 2$ and for each 
$j \ge 3$, $m_i = j$ for all integers $i$ satisfying 

$$K_2 + K_3 + \cdots + K_{j-1} < i \le K_2 + K_3 + \cdots + K_j.$$

Define 

$$k(M) = \max\{k \ge 1 : m_1 + m_2 + \cdots + m_k \le M\}, \quad M \ge 2.$$

Since the sequence $\{K_j : j \ge 2\}$ grows exponentially fast, it follows that $k(M)/M = O(1/\log_2 M)$ 
by an argument similar to an argument on page 753 of [10], and hence 

$$\lim_{M \to \infty} k(M)/M = 0. \tag{4.22}$$
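The slow decay of $k(M)/M$ can be observed numerically. The sketch below assumes that $K_j$ is the number of binary trees with $j$ leaves, which matches the counts 1, 2, 5 quoted above for $j = 2, 3, 4$ and equals the Catalan number with index $j - 1$; the function names are hypothetical. It computes $k(M)$ by greedily taking the smallest trees first, which is what the definition of the sequence $m_i$ amounts to.

```python
from math import comb


def num_trees(j):
    """K_j: number of binary trees with j leaves (Catalan number C_{j-1})."""
    return comb(2 * (j - 1), j - 1) // j


def k_of(M):
    """Largest k with m_1 + ... + m_k <= M, where m_i lists tree sizes in order."""
    total = 0   # leaves used so far
    count = 0   # trees taken so far
    j = 2       # current tree size
    while True:
        available = num_trees(j)
        fit = min(available, (M - total) // j)  # trees of size j that still fit
        count += fit
        total += fit * j
        if fit < available:
            return count
        j += 1


ratios = [(M, k_of(M) / M) for M in (10**3, 10**4, 10**5, 10**6)]
```

For example, `k_of(8)` is 3, since the three smallest trees have 2, 3, and 3 leaves, and the entries of `ratios` decrease as `M` grows, in line with the limit above.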

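From (4.21), we have shown that 

$$\lceil N(t)/2 \rceil \le k(|t| C), \quad t \in \mathcal{T}, \ P(t) > 0.$$

Dividing both sides by $|t|$ and summing, we then have 

$$\sum_{t \in F} r(t) P(t) \le 2 \sum_{t \in F} |t|^{-1} k(|t| C) P(t), \quad F \in \mathcal{F}. \tag{4.23}$$

Let $n_F = \min\{|t| : t \in F\}$, and define 

$$\delta(J) = \sup\{k(j)/j : j \ge J\}, \quad J \ge 2.$$

From (4.23), we then have 

$$\sum_{t \in F} r(t) P(t) \le 2C\,\delta(n_F C), \quad F \in \mathcal{F}. \tag{4.24}$$

By (1.1), $\lim_{F \in \mathcal{F}} n_F = \infty$, and we also have $\lim_{J \to \infty} \delta(J) = 0$. Taking the limit along filter $\mathcal{F}$ 
on both sides of (4.24), we then obtain (3.12), which is the RRN Property for the source 
$(\mathcal{T}, \mathcal{F}, P)$. 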

V. Universal Coding of Depth-Centric Binary Tree Sources 

For each $t \in \mathcal{T}^*$, define $d(t)$ to be the depth of $t$, which is the number of edges in the 
longest root-to-leaf path in $t$. We have $d(t^*) = 0$ and, as defined in Ex. 2, for each $n \ge 1$ 
we let $\mathcal{T}^n$ be the set of trees $\{t \in \mathcal{T} : d(t) = n\}$. We fix throughout this section the 
$\mathcal{T}$-filter $\mathcal{F}_2 = \{\mathcal{T}^n : n \ge 1\}$. We now formally define the set of depth-centric binary tree sources, 
which are certain binary tree sources of the form $(\mathcal{T}, \mathcal{F}_2, P)$. Let $\mathbb{Z}_+$ be the set of nonnegative 
integers, and let $\Sigma_2$ be the set of all functions $\sigma$ from $\mathbb{Z}_+ \times \mathbb{Z}_+$ into $[0, 1]$ such that 

$$\sum_{\{(i,j) : i,j \ge 0,\ \max(i,j) = n-1\}} \sigma(i, j) = 1, \quad n \ge 1.$$

For each $\sigma \in \Sigma_2$, let $P_\sigma$ be the mapping from $\mathcal{T}$ into $[0, 1]$ such that 

$$P_\sigma(t) = \prod_{v \in V^1(t)} \sigma(d(t(v)_L), d(t(v)_R)), \quad t \in \mathcal{T}.$$

Since 

$$\sum_{t \in \mathcal{T}^n} P_\sigma(t) = 1, \quad n \ge 1,$$

$S(\sigma) = (\mathcal{T}, \mathcal{F}_2, P_\sigma)$ is a binary tree source. The sources in the family $\{S(\sigma) : \sigma \in \Sigma_2\}$ are 
called depth-centric binary tree sources, the reason being that the probability of each tree is 
based purely upon the depths of its final subtrees. 

Example 8. Let $\Sigma_2^\dagger$ be the subset of $\Sigma_2$ consisting of all $\sigma \in \Sigma_2$ for which 

$$\{(i,j) : i,j \ge 0,\ \max(i,j) = n-1,\ \sigma(i,j) > 0\} \subset \{(0, n-1), (n-1, 0)\}, \quad n \ge 1.$$

If $\sigma \in \Sigma_2^\dagger$, then a tree $t \in \mathcal{T}$ has positive $P_\sigma$ probability if and only if $t$ is a one-dimensional 
tree. The family of sources $\{S(\sigma) : \sigma \in \Sigma_2^\dagger\}$ has no universal code by the same argument given 
in Ex. 6. Thus, the bigger family of all depth-centric binary tree sources also has no universal 
code. 

Our final result shows that $(\phi_e, \phi_d)$ is a universal code for a suitably restricted subfamily 
of the family of depth-centric binary tree sources. 

Theorem 4. Let $\Sigma_2^*$ be the uncountable set consisting of all $\sigma \in \Sigma_2$ such that 

$$\sup\{|i - j| : i,j \ge 0,\ \sigma(i,j) > 0\} < \infty \tag{5.25}$$

and 

$$\mathrm{card}\{|i - j| : i,j \ge 0,\ \max(i,j) = n-1,\ \sigma(i,j) > 0\} = 1, \quad n \ge 1. \tag{5.26}$$

Then $(\phi_e, \phi_d)$ is a universal code for the family of sources $\{S(\sigma) : \sigma \in \Sigma_2^*\}$. 

Proof. Each source in the family $\{S(\sigma) : \sigma \in \Sigma_2^*\}$ satisfies the Domination Property, by the 
same argument given in the proof of Theorem 3. Appealing to Theorem 2, our proof will be 
complete once we verify that each source in this family also satisfies the RRN Property. Fix 
the source $S(\sigma)$, where $\sigma \in \Sigma_2^*$. By the last part of the proof of Theorem 3, $S(\sigma)$ will satisfy 
the RRN Property if 

$$\sup_{t \in \mathcal{T},\, P_\sigma(t) > 0} \left\{ \max_{v \in V^1(t)} \frac{|t(v)|}{\min(|t(v)_L|, |t(v)_R|)} \right\} < \infty. \tag{5.27}$$

By (5.26), for each $n \ge 1$, there exists $k_n \in \{0, 1, \ldots, n-1\}$ such that 

$$\{(i,j) : i,j \ge 0,\ \max(i,j) = n-1,\ \sigma(i,j) > 0\} \subset \{(k_n, n-1), (n-1, k_n)\}.$$

Let $(x(n) : n \ge 0)$ be the sequence of real numbers such that $x(0) = 1$ and 

$$x(n) = x(n-1) + x(k_n), \quad n \ge 1.$$

We prove the statement 

$$|t| = x(d(t)), \quad t \in \{t^*\} \cup \{t' \in \mathcal{T} : P_\sigma(t') > 0\} \tag{5.28}$$

by induction on $|t|$, starting with $|t| = 1$. If $|t| = 1$, then $t = t^*$ and $|t| = x(d(t))$ is the true 
statement $1 = x(0)$. Now fix $u \in \mathcal{T}$ for which $P_\sigma(u) > 0$, and assume as our induction 
hypothesis that $|t| = x(d(t))$ holds for every $t \in \{t^*\} \cup \{t' \in \mathcal{T} : P_\sigma(t') > 0\}$ for which $|t| < |u|$. 
Note that $(d(u_L), d(u_R))$ belongs to the set $\{(d(u) - 1, k_{d(u)}), (k_{d(u)}, d(u) - 1)\}$. The induction 
hypothesis holds for both $u_L$ and $u_R$, and so 

$$|u| = |u_L| + |u_R| = x(d(u_L)) + x(d(u_R)) = x(d(u) - 1) + x(k_{d(u)}) = x(d(u)),$$

completing the proof of statement (5.28). We conclude from (5.28) that for every $t \in \mathcal{T}$ for 
which $P_\sigma(t) > 0$, 

$$\frac{|t(v)|}{\min(|t(v)_L|, |t(v)_R|)} \in \{x(n)/x(k_n) : n \ge 1\}, \quad v \in V^1(t).$$

By (5.25), let $m \in \mathbb{Z}_+$ be the supremum on the left side of (5.25); then $n - 1 - k_n \le m$ for 
$n \ge 1$. Since the sequence $(x(n))$ is nondecreasing, $x(n)/x(n-1) \le 2$ for $n \ge 1$, and so 

$$\frac{x(n)}{x(k_n)} = \prod_{i=k_n+1}^{n} \frac{x(i)}{x(i-1)} \le 2^{n-k_n} \le 2^{m+1}, \quad n \ge 1.$$

Thus, the left side of (5.27) is at most $2^{m+1}$ and (5.27) holds, completing our proof. 
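The recursion for $x(n)$ and the resulting ratio bound can be illustrated with a small sketch. The particular choice $k_n = \max(n - 2, 0)$ below is an assumption made purely for illustration (it is one admissible choice with $n - 1 - k_n \le 1$, so $m = 1$); the function name is hypothetical.

```python
def x_sequence(k, n_max):
    """x(0) = 1 and x(n) = x(n-1) + x(k(n)) for 1 <= n <= n_max."""
    x = [1]
    for n in range(1, n_max + 1):
        x.append(x[n - 1] + x[k(n)])
    return x


def k(n):
    # Illustrative choice of k_n; any k_n in {0, ..., n-1} would be admissible.
    return max(n - 2, 0)


m = 1  # supremum of n - 1 - k(n) over n >= 1 for this choice
x = x_sequence(k, 20)
worst_ratio = max(x[n] / x[k(n)] for n in range(1, 21))
```

With this choice the sequence starts 1, 2, 3, 5, 8 (the leaf counts of the depth-$n$ trees whose two subtrees at every non-leaf vertex differ in depth as prescribed), and `worst_ratio` stays below $2^{m+1} = 4$.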

VI. Conclusions 

We have shown that the grammar-based code $(\phi_e, \phi_d)$ on the set $\mathcal{T}$ of binary tree structures 
defined in this paper is asymptotically optimal for any binary tree source satisfying the 
Domination Property and the Representation Ratio Negligibility Property. In typical cases, we 
have found that the Domination Property is easy to verify for a binary tree source, whereas the 
RRN Property is more troublesome to verify. In a subsequent paper [11], we investigate more 
scenarios in which the RRN Property will hold. (The one-dimensional binary trees discussed 
in Example 6 need to be avoided in a binary tree source model, as well as some trees derived 
from these.) In [11], we also show that $(\phi_e, \phi_d)$ is universal for some families of binary tree 
sources induced by branching processes (including families of sources which were considered 
in [15] from an entropy point of view but not from a compression point of view). 